BIC selection procedures in mixed effects models
Authors
Abstract
We consider the problem of variable selection in general nonlinear mixed-effects models, including mixed-effects hidden Markov models. These models are used extensively in the study of repeated measurements and in longitudinal analysis. We propose a Bayesian Information Criterion (BIC) that is appropriate for nonstandard situations where both the number of subjects N and the number of measurements per subject n tend to infinity. In this case, the consistency rates of the maximum likelihood estimators (MLE) of the parameters depend on the level of variability designed in the model. We show that the MLE of the population parameters related to subject-specific parameters are √N-consistent, whereas the MLE of the parameters related to fixed parameters are √(Nn)-consistent. We derive a BIC with a penalty based on two terms proportional to log N and log(Nn). Finite-sample properties of the proposed selection procedure are investigated by simulation studies.
Key-words: Consistency rate, Nonlinear mixed model, Hidden Markov mixed-effects model, Variable selection.
∗ Laboratoire de Mathématiques, Université Paris-Sud, France & Popix, Inria Saclay Ile-de-France
hal-00696435, version 1, 11 May 2012
Procédures de sélection de variables de type BIC dans les modèles à effets mixtes (BIC-type variable selection procedures in mixed-effects models)
Résumé: We address the problem of variable selection in general nonlinear mixed models, including mixed-effects hidden Markov models. These models are widely used to analyze repeated or longitudinal data. We propose a BIC (Bayesian Information Criterion) suited to the nonstandard double-asymptotic setting in which the number of subjects N and the number of observations per subject n both tend to infinity. In this setting, the convergence rates of the maximum likelihood estimators (MLE) of the parameters depend on the levels of variability expressed in the model. We show that the MLE of the population parameters tied to subject-specific parameters are √N-convergent, whereas the MLE of the parameters tied to parameters without a random component are √(Nn)-convergent. From this we derive a BIC whose penalty consists of two terms, in log N and log(Nn). We illustrate the behavior of the proposed variable selection method with a simulation study.
Mots-clés: Mixed-effects hidden Markov model, Nonlinear mixed model, Variable selection, Convergence rates.
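The two-term penalty described in the abstract can be illustrated with a minimal sketch. The abstract states only that the penalty has two terms proportional to log N and log(Nn); the form below assumes that each term is weighted by a parameter count (the hypothetical arguments `n_random_params` for population parameters tied to subject-specific parameters, and `n_fixed_params` for parameters without a random component). The function names and this exact weighting are assumptions made for illustration, not the authors' stated formula.

```python
import math


def bic_two_penalties(log_lik, n_random_params, n_fixed_params,
                      n_subjects, n_obs_per_subject):
    """Illustrative BIC with two penalty terms (assumed form).

    Parameters tied to subject-specific (random) effects are penalized
    with log N; parameters without a random component are penalized
    with log(N * n).
    """
    penalty = (n_random_params * math.log(n_subjects)
               + n_fixed_params * math.log(n_subjects * n_obs_per_subject))
    return -2.0 * log_lik + penalty


def standard_bic(log_lik, n_params, sample_size):
    """Classical BIC with a single log(sample size) penalty."""
    return -2.0 * log_lik + n_params * math.log(sample_size)


# Example: 10 subjects, 5 observations each, log-likelihood of -100,
# 2 parameters tied to random effects and 1 fixed parameter.
example = bic_two_penalties(-100.0, 2, 1, 10, 5)
print(round(example, 3))
```

Under this assumed form, setting the number of observations per subject to 1 makes both penalty terms use log N, so the criterion reduces to the classical BIC with sample size N.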
Similar resources
A note on BIC in mixed-effects models
The Bayesian Information Criterion (BIC) is widely used for variable selection in mixed effects models. However, its expression is unclear in typical situations of mixed effects models, where a simple definition of the sample size is not meaningful. We derive an appropriate BIC expression that is consistent with the random effect structure of the mixed effects model. We illustrate the behavior of...
Regression with Multiple Candidate Models: Selecting or Mixing?
Model averaging provides an alternative to model selection. An algorithm, ARM, rooted in information theory is proposed to combine different regression models/methods. A simulation is conducted in the context of linear regression to compare its performance with familiar model selection criteria AIC and BIC, and also with some Bayesian model averaging (BMA) methods. The simulation suggests the foll...
Bayesian information criterion for longitudinal and clustered data.
When a number of models are fit to the same data set, one method of choosing the 'best' model is to select the model for which Akaike's information criterion (AIC) is lowest. AIC applies when maximum likelihood is used to estimate the unknown parameters in the model. The value of -2 log likelihood for each model fit is penalized by adding twice the number of estimated parameters. The number of ...
Model Selection in Linear Mixed Models
Linear mixed effects models are highly flexible in handling a broad range of data types and are therefore widely used in applications. A key part in the analysis of data is model selection, which often aims to choose a parsimonious model with other desirable properties from a possibly very large set of candidate statistical models. Over the last 5–10 years the literature on model selection in l...
Model selection strategies for identifying most relevant covariates in homoscedastic linear models
We propose a new method in two variations for the identification of most relevant covariates in linear models with homoscedastic errors. In contrast to AIC, BIC and other information criteria, our method is based on an interpretable scaled quantity. This quantity measures a maximal relative error one makes by selecting covariates from a given set of all available covariates. The proposed model ...